1.1  Introduction

The research suggested in the proposal dealt with the application of
quasi-Newton methods to optimal control problems. The main motivation was
that these methods are very useful for optimization problems and exhibit a
superlinear rate of convergence. In infinite-dimensional spaces, however,
this convergence rate was known to hold only under additional assumptions.
Optimal control problems are formulated in infinite-dimensional spaces, and
hence the superlinear convergence behavior of quasi-Newton methods for these
problems had to be investigated.


1.2  Classical  Quasi-Newton  Methods 

Optimal control problems of the following type were considered: Let
$L: \mathbb{R}^{n+m+1} \to \mathbb{R}$ and $f: \mathbb{R}^{n+m+1} \to \mathbb{R}^n$
for some $n, m \in \mathbb{N}$, and let $T > 0$, $x_0 \in \mathbb{R}^n$.

Minimize

\[ \int_0^T L(x(t), u(t), t)\, dt \tag{1} \]

subject to

\[ \dot{x}(t) = f(x(t), u(t), t), \qquad x(0) = x_0. \tag{2} \]


If it is assumed that (2) is uniquely solvable for all iterates $u$, then the
objective can be written completely in terms of $u$:

\[ F(u) = \int_0^T L(x(u,t), u(t), t)\, dt. \tag{3} \]


The gradient of $F$ is given by

\[ \nabla F(u) = p(\cdot)\, f_u(x(\cdot), u(\cdot), \cdot) + L_u(x(\cdot), u(\cdot), \cdot), \tag{4} \]

where $p$ solves the adjoint equation

\[ -\dot{p}(t) = p(t)\, f_x(x(t), u(t), t) + L_x(x(t), u(t), t), \qquad p(T) = 0. \tag{5} \]


The computation of the Hessian of $F$ is obviously even more complex, so
quasi-Newton methods, which do not require knowledge of the Hessian, are a
desirable choice. Let $F: H \to \mathbb{R}$, with $H$ a Hilbert space, be twice
Fréchet-differentiable, and let $u_0 \in H$ and $B_0 \in L(H)$, the space of
bounded linear operators on $H$, be given. Then the BFGS method can be defined
as follows:


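(i)  Solve $B_i s_i = -\nabla F(u_i)$.

(ii)  Set $u_{i+1} = u_i + s_i$ and $y_i = \nabla F(u_{i+1}) - \nabla F(u_i)$.

(iii)  Set

\[ B_{i+1} = B_i + \frac{\langle y_i, \cdot \rangle}{\langle y_i, s_i \rangle}\, y_i - \frac{\langle B_i s_i, \cdot \rangle}{\langle B_i s_i, s_i \rangle}\, B_i s_i. \tag{6} \]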
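Steps (i) to (iii) can be sketched in finite dimensions, with $\mathbb{R}^n$ standing in for the Hilbert space $H$ and the Euclidean inner product for $\langle \cdot, \cdot \rangle$. The quadratic test functional, the data `A` and `b`, and the starting operator `B0 = 3 I` are illustrative assumptions and are not taken from [1]; as in the definition above, full steps are taken without a line search.

```python
import numpy as np

def bfgs(grad, u0, B0, tol=1e-10, max_iter=50):
    """BFGS iteration (i)-(iii) with full steps, in R^n instead of H."""
    u = np.asarray(u0, dtype=float)
    B = np.asarray(B0, dtype=float)
    g = grad(u)
    norms = [np.linalg.norm(g)]
    for _ in range(max_iter):
        if norms[-1] < tol:
            break
        s = np.linalg.solve(B, -g)      # (i)   B_i s_i = -grad F(u_i)
        u = u + s                       # (ii)  u_{i+1} = u_i + s_i
        g_new = grad(u)
        y = g_new - g                   #       y_i = grad F(u_{i+1}) - grad F(u_i)
        Bs = B @ s
        # (iii) rank-two update of the Hessian approximation
        B = B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (Bs @ s)
        g = g_new
        norms.append(np.linalg.norm(g))
    return u, B, norms

# Hypothetical strictly convex quadratic F(u) = 0.5 u^T A u - b^T u.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 1.0])
u_min, B_final, norms = bfgs(lambda u: A @ u - b, np.zeros(2), 3.0 * np.eye(2))
```

The list `norms` records the gradient norm at every iterate, which is the quantity whose decay is discussed in the remainder of this section.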


It was shown in [1] that for the control problem the superlinear rate of
convergence in the Hilbert space norm,

\[ \lim_{i \to \infty} \frac{\|u_{i+1} - u_*\|}{\|u_i - u_*\|} = 0, \tag{7} \]

holds if $B_0$ and $u_0$ are chosen close enough to $\nabla^2 F(u_*)$ and $u_*$
and if, in addition,

\[ B_0 = H_{uu} + C, \tag{8} \]

where $C$ is a compact operator and

\[ H_{uu} v = \bigl( p_*(\cdot)\, f_{uu}(x_*(\cdot), u_*(\cdot), \cdot) + L_{uu}(x_*(\cdot), u_*(\cdot), \cdot) \bigr)\, v(\cdot). \]


Otherwise,  one  can  expect  at  most  a  linear  rate  of  convergence. 

Obviously, the control problem (1), (2) cannot be solved numerically
unless it is discretized. However, one might suspect that the compactness
condition (8) on the initial approximation of the Hessian also influences the
convergence behavior of the discretized finite-dimensional problem. In [1] a
fourth-order Runge-Kutta scheme was applied, with Hermite interpolation at
intermediate points. The inner product was approximated by a composite
Simpson's rule. Hence the discretized problem looked as follows:

    For given $u^N$, solve (2) and obtain $x^N \in \mathbb{R}^{N+1}$.
    Then use (5) to compute $p^N \in \mathbb{R}^{N+1}$ and evaluate (4)
    at the grid points.                                              (9)
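Procedure (9) can be sketched as follows for a scalar model problem. Everything specific here is an assumption made for illustration: the model $f(x,u,t) = -x + u$, $L(x,u,t) = (x^2 + u^2)/2$, $x_0 = 1$, $T = 1$ is hypothetical, explicit Euler steps replace the fourth-order Runge-Kutta scheme of [1], and the trapezoidal rule replaces the composite Simpson rule.

```python
import numpy as np

# Hypothetical model problem (n = m = 1), not taken from the report:
#   f(x, u, t) = -x + u,  L(x, u, t) = (x**2 + u**2) / 2,  x(0) = 1,  T = 1.
T, X0 = 1.0, 1.0

def solve_state(u, h):
    """Explicit Euler, forward in time, for the state equation (2)."""
    x = np.empty_like(u)
    x[0] = X0
    for k in range(len(u) - 1):
        x[k + 1] = x[k] + h * (-x[k] + u[k])
    return x

def solve_adjoint(x, h):
    """Explicit Euler, backward in time, for the adjoint equation (5).

    Here f_x = -1 and L_x = x, so (5) reads -p' = -p + x with p(T) = 0.
    """
    p = np.empty_like(x)
    p[-1] = 0.0
    for k in range(len(x) - 1, 0, -1):
        p[k - 1] = p[k] + h * (-p[k] + x[k])
    return p

def gradient(u, h):
    """Evaluate (4) at the grid points: p * f_u + L_u = p + u."""
    x = solve_state(u, h)
    return solve_adjoint(x, h) + u

def objective(u, h):
    """Trapezoidal approximation of the objective (1)."""
    x = solve_state(u, h)
    vals = 0.5 * (x**2 + u**2)
    return h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

N = 400
h = T / N
u = np.zeros(N + 1)            # current control iterate
v = np.ones(N + 1)             # test direction
w = np.full(N + 1, h)
w[0] = w[-1] = h / 2           # trapezoidal weights for the inner product

# Directional derivative from (9) versus a central difference of the
# discretized objective.
dir_adjoint = np.sum(w * gradient(u, h) * v)
delta = 1e-4
dir_fd = (objective(u + delta * v, h) - objective(u - delta * v, h)) / (2 * delta)
```

The two directional derivatives agree only up to the discretization error, since the vector produced by (9) is a discretization of the continuous gradient rather than the exact gradient of the discretized objective.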


This procedure is quite different from the other possible route, which is to
discretize (2) and substitute it into (1). In the latter case one obtains very
complicated expressions for the gradient, whereas the approach outlined in (9)
is much easier to apply. However, (9) does not yield, in general, the gradient
of a functional, so the Jacobian is not symmetric, and an application of the
BFGS method seems undesirable because it maintains symmetry of the
approximating matrices. But it was shown in [1] that this does not give rise
to problems, because the finite-dimensional problem is close to an
infinite-dimensional problem which exhibits all the required features, such as
self-adjointness and positive definiteness.

As a measure of convergence, [1] used the first iteration index for which the
tolerance was achieved:

\[ i(\varepsilon) = \min\{ i \in \mathbb{N} : \|\nabla F(u_i)\| < \varepsilon \}, \]
\[ i_N(\varepsilon) = \min\{ i \in \mathbb{N} : \|\nabla F_N(u_i^N)\|_N < \varepsilon \}. \]
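Given a recorded history of gradient norms, these indices are straightforward to compute. The following is a minimal sketch; the two residual histories are made-up numbers chosen for illustration and are not data from [1].

```python
def first_index_below(norms, eps):
    """Return min{ i : norms[i] < eps }, or None if the tolerance is never met."""
    return next((i for i, r in enumerate(norms) if r < eps), None)

# Hypothetical residual histories (illustrative numbers only).
norms_continuous = [1.0, 3.0e-1, 5.0e-2, 1.0e-3, 1.0e-7]
norms_discrete = [1.0, 3.1e-1, 5.2e-2, 1.1e-3, 2.0e-7]

eps = 1e-2
i_eps = first_index_below(norms_continuous, eps)
i_N_eps = first_index_below(norms_discrete, eps)
```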

Under appropriate conditions one can show that for $N$ large enough

\[ i(\varepsilon) - 1 \le i_N(\varepsilon) \le i(\varepsilon), \tag{10} \]

i.e. the termination criterion for the finite-dimensional problems is
asymptotically satisfied at the same iterate as for the underlying
infinite-dimensional problem. Inequality (10) has been verified for numerical
examples, but these tests were even more revealing with regard to the rate of
convergence. In Fig. 1 a graph shows the decrease in the norm of the residual
$\|\nabla F_N(u_i^N)\|_N$ for two choices of $B_0$: for choice a) the
compactness condition (8) was satisfied, for choice b) it was not. Both
choices had approximately the same distance from the Jacobian. Although one
would not expect these two initial selections for $B_0$ to make a difference
for the finite-dimensional convergence behavior, Fig. 1 illustrates the
fundamentally different convergence rates. This effect can be explained and
even predicted by making use of the infinite-dimensional theory.

1.3  Pointwise  Approach

In [2] the same problem (1), (2) was considered, but the approach of applying
quasi-Newton methods to it was quite different. Suppose the necessary
optimality conditions (2), (4) with $\nabla F(u) = 0$, and (5) are treated as
a system of equations in the unknowns $(p, x, u)$. For $n = m = 1$ this means
that one wants to solve

\[ F(p, x, u) = \begin{pmatrix} \dot{x} - f(x,u,t) \\ \dot{p} + f_x(x,u,t)\, p + L_x(x,u,t) \\ f_u(x,u,t)\, p + L_u(x,u,t) \end{pmatrix} = 0. \]


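The Jacobian has the following structure in a proper function space:

\[ F'(p, x, u) = \begin{pmatrix} 0 & D - f_x & -f_u \\ D + f_x & H_{xx} & H_{ux} \\ f_u & H_{xu} & H_{uu} \end{pmatrix}, \tag{11} \]

where $D = d/dt$ is the differentiation operator,

\[ H(x, u, t) = f(x, u, t)\, p(t) + L(x, u, t), \]

and $F: X_r \to Y_r$ with $1 < r < \infty$,

\[ X_r = W^{1,r}[0,T] \times W^{1,r}[0,T] \times L^r[0,T], \]
\[ Y_r = L^r[0,T] \times L^r[0,T] \times L^r[0,T]. \]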

Under a second-order sufficiency condition for optimality one can show the
regularity of $F'$ and hence prove the quadratic rate of convergence for
Newton's method. If a quasi-Newton method is designed for this problem, one
should take into account as much structure as possible. For example, the
terms $D$, $f_x$ and $f_u$ in $F'$ are known exactly because they are needed
for the evaluation of $F$. Hence it is only necessary to update the lower
right $2 \times 2$ block of $F'$ in (11). All the operators in this block are
multiplication operators, so an update should be performed with
multiplication operators. This leads to updating the $2 \times 2$ block for
each $t$ separately. In a discretized version this means that if $u$ is
replaced by an $N$-dimensional vector, then $N$ blocks of size $2 \times 2$
need to be updated. If $N$ is large for the available computing environment,
this can be a decisive advantage over the method presented in [1], where an
$N \times N$ matrix needs to be updated. Another advantage of the pointwise
updates is that at each step only a linear two-point boundary value problem
needs to be solved, whereas in [1] the nonlinear state equation has to be
solved at each iteration. The pointwise quasi-Newton method is described and
analyzed in detail in [2] and has also been very successful for the numerical
example.
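An interesting result is the rate of convergence which could be shown for the
method: with $z = (p, x, u)$ the following holds for all
$1 < s < r < \infty$:

\[ \lim_{i \to \infty} \frac{\|z_{i+1} - z_*\|_{X_s}}{\|z_i - z_*\|_{X_r}} = 0. \tag{12} \]
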
This is not the superlinear convergence rate (7) because different norms are
used for $z_{i+1} - z_*$ and $z_i - z_*$, but these two norms can be
arbitrarily close. This result (12) comes from the fact that the system of
nonlinear equations $F(p,x,u) = 0$ contains algebraic equations. This approach
to solving optimal control problems shows a lot of potential for extension in
the direction of constraints and the control of partial differential
equations.
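The storage argument for the pointwise approach can be made concrete with a small sketch. It keeps one $2 \times 2$ block per grid point and updates all blocks independently. The update used here is a Broyden-type rank-one secant update chosen for illustration only; the precise update formula of [2] is not reproduced, and the array names and random data are hypothetical.

```python
import numpy as np

def pointwise_secant_update(blocks, s, y):
    """Update N independent 2x2 blocks so that blocks_new[k] @ s[k] == y[k].

    blocks has shape (N, 2, 2); s and y have shape (N, 2). A block whose
    step s[k] is zero is left unchanged.
    """
    Bs = np.einsum("kij,kj->ki", blocks, s)      # B_k s_k for every grid point
    r = y - Bs                                   # secant residual per grid point
    ss = np.einsum("ki,ki->k", s, s)             # |s_k|^2
    inv = np.zeros_like(ss)
    np.divide(1.0, ss, out=inv, where=ss > 0)    # scalar pseudo-inverse
    return blocks + np.einsum("ki,kj->kij", r, s) * inv[:, None, None]

rng = np.random.default_rng(0)
N = 5
blocks = rng.normal(size=(N, 2, 2))
s = rng.normal(size=(N, 2))
y = rng.normal(size=(N, 2))
s[0] = 0.0                                       # a grid point with no step
new_blocks = pointwise_secant_update(blocks, s, y)
```

Only $4N$ numbers are stored and updated here, compared with $N^2$ entries for a full $N \times N$ matrix.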


In [3] a first approach towards optimal control problems with partial
differential equations is undertaken. A model from heat conduction with memory
was taken and a boundary control applied. The differential equation, which is
of pseudoparabolic type, looks as follows:


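\[
\begin{aligned}
y_t &= y_{xx} + \varepsilon\, y_{xtx}, && x \in (0,1),\ t \in (0,T), \\
y(0,x) &= 0, && x \in (0,1), \\
y_x(t,1) &= 0, && t \in (0,T], \\
y_x(t,0) - \varepsilon\, y_{xt}(t,0) & && t \in (0,T].
\end{aligned}
\]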


Here $\varepsilon > 0$ is a material constant and $u$ is the control function.
The solution of the differential equation can be represented by a Fourier
expansion and is approximated by the first $N$ terms of the series expansion.
The theory developed in [1] can be applied to approximations of this problem,
and all the assumptions can be verified.


In the case that the objective functional is of integral type,

\[ \int_0^1 (y(T,x) - z(x))^p\, dx + \alpha \int_0^T u(t)^2\, dt \]

with $z \in C[0,1]$, $\alpha > 0$ and $p \in \mathbb{N}$, one can verify that
the only nonlinearity is described by a real-valued function depending on real
numbers. Based on the secant method one can construct an update which allows
the same error estimates as the secant method, and this yields a convergence
rate of R-order $(\sqrt{5} + 1)/2$. Several numerical examples in [3]
illustrate that this rate is actually observable. In addition, a number of
test runs were made for decreasing values of the constant $\alpha$ in the cost
function. The problem then became less and less well conditioned, which
resulted in a larger number of iterations to achieve the tolerance.
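The R-order $(\sqrt{5} + 1)/2 \approx 1.618$ quoted above is that of the classical secant method for a scalar equation. As a reminder of that building block, a minimal sketch follows; the scalar function `g` is a hypothetical example and not the function arising in [3].

```python
def secant(g, a0, a1, tol=1e-12, max_iter=50):
    """Classical secant iteration for a scalar equation g(a) = 0."""
    iterates = [a0, a1]
    g0, g1 = g(a0), g(a1)
    for _ in range(max_iter):
        if abs(g1) < tol or g1 == g0:   # converged, or secant slope undefined
            break
        a2 = a1 - g1 * (a1 - a0) / (g1 - g0)
        a0, g0 = a1, g1
        a1, g1 = a2, g(a2)
        iterates.append(a1)
    return a1, iterates

# Hypothetical scalar test function with root 2 ** (1/3).
root, iterates = secant(lambda a: a**3 - 2.0, 1.0, 2.0)
```

The list `iterates` can be used to inspect how quickly successive errors shrink.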

1.5  Elliptic  Boundary  Value  Problems 

Nonlinear elliptic boundary value problems are considered in [4].
Obviously, a discretization of this problem leads to a Jacobian with a large
amount of sparsity. This is an advantage which should be used in the design
of quasi-Newton methods. If one applies the Schubert algorithm to this
problem, then from the tridiagonal structure of the Jacobian the update is
given by [4, 1.12]. By letting the discretization parameter tend to zero,
one obtains as a continuous version


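\[ (B_{i+1} v)(x) = (B_i v)(x) + (y_i - B_i s_i)(x)\, v(x)\, (s_i(x))^+, \tag{13} \]

where

\[ a^+ = \begin{cases} 1/a & \text{if } a \neq 0, \\ 0 & \text{if } a = 0, \end{cases} \qquad y_i = F(u_{i+1}) - F(u_i), \]

and

\[ F(u) = \nabla^2 u + f(x, u, \nabla u). \tag{14} \]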
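A discrete analogue of update (13) can be sketched as follows, assuming that $B_i$ is stored as a fixed discrete Laplacian plus a multiplication operator with coefficient vector `b`, so that only `b` changes. The one-dimensional grid, the finite-difference Laplacian and the random data are assumptions made for illustration and are not taken from [4].

```python
import numpy as np

def scalar_pinv(a):
    """Pointwise a^+ : 1/a where a != 0, and 0 where a == 0."""
    out = np.zeros_like(a, dtype=float)
    np.divide(1.0, a, out=out, where=a != 0)
    return out

def update_multiplier(lap, b, s, y):
    """One step of (13) for B = lap + diag(b); returns the new coefficient b."""
    Bs = lap @ s + b * s
    return b + (y - Bs) * scalar_pinv(s)

# Hypothetical 1-D grid with n interior points and Dirichlet boundary values.
n = 6
h = 1.0 / (n + 1)
lap = (np.diag(-2.0 * np.ones(n))
       + np.diag(np.ones(n - 1), 1)
       + np.diag(np.ones(n - 1), -1)) / h**2

rng = np.random.default_rng(1)
b = np.zeros(n)
s = rng.normal(size=n)
y = rng.normal(size=n)
s[2] = 0.0                       # a point where the step vanishes
b_new = update_multiplier(lap, b, s, y)
mask = s != 0
```

Wherever the step is nonzero the updated operator satisfies the secant condition at that grid point, and where the step vanishes the coefficient is left unchanged.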


If one starts with an operator $B_0$ which includes the Laplacian, then the
update (13) contains only multiplication operators. In the case where (14)
does not depend on $\nabla u$ this approach works fine, but otherwise the
Jacobian is of the form

\[ F'(u) = \nabla^2 + f_2(x, u, \nabla u) + f_3(x, u, \nabla u)\, \nabla, \]

which contains a derivative term of first order. In Schubert's method this
term is approximated by multiplication operators only, which leads to problems
in the numerical performance. The remedy is to update $B$ pointwise also for a
derivative term:

\[ (B_{i+1} v)(x) = (B_i v)(x) + (y_i - B_i s_i)(x)\, \bigl( s_i(x) v(x) + \nabla s_i(x) \nabla v(x) \bigr)\, \bigl( s_i(x)^2 + \nabla s_i(x)^2 \bigr)^+. \]
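In a discretization, the update above amounts to correcting two coefficient functions at once, one multiplying $v$ and one multiplying its derivative. The following one-dimensional sketch assumes $B v = \text{lap}\, v + b_0 v + b_1 D v$ with a central-difference matrix `D`; the grid, the difference matrices and the random data are hypothetical and serve only to check the secant property.

```python
import numpy as np

def apply_B(lap, D, b0, b1, v):
    """Apply B v = lap v + b0 * v + b1 * (D v), with pointwise products."""
    return lap @ v + b0 * v + b1 * (D @ v)

def update_with_derivative(lap, D, b0, b1, s, y):
    """Pointwise update of both coefficients; returns (b0_new, b1_new)."""
    ds = D @ s
    r = y - apply_B(lap, D, b0, b1, s)          # y_i - B_i s_i
    denom = s**2 + ds**2
    w = np.zeros_like(denom)
    np.divide(r, denom, out=w, where=denom != 0)  # r * (s^2 + (Ds)^2)^+
    return b0 + w * s, b1 + w * ds

# Hypothetical 1-D grid with n interior points and zero boundary values.
n = 8
h = 1.0 / (n + 1)
upper = np.diag(np.ones(n - 1), 1)
lower = np.diag(np.ones(n - 1), -1)
lap = (upper + lower - 2.0 * np.eye(n)) / h**2
D = (upper - lower) / (2.0 * h)

rng = np.random.default_rng(2)
b0, b1 = np.zeros(n), np.zeros(n)
s = rng.normal(size=n)
y = rng.normal(size=n)
b0_new, b1_new = update_with_derivative(lap, D, b0, b1, s, y)
```

With these coefficients the updated operator maps `s` to `y` at every grid point where $s^2 + (Ds)^2$ is nonzero.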

The fact that derivative terms are accounted for also shows in the convergence
result:

\[ \lim_{i \to \infty} \frac{\|u_{i+1} - u_*\|}{\|u_i - u_*\|} = 0 \quad \text{with } \|\cdot\| \text{ the } C^1\text{-norm.} \]


The numerical results include nonlinear two-dimensional elliptic problems.
Also, the updates in this paper are pointwise quasi-Newton updates.